Empirical Text Mining for Genre Detection

Authors

  • Vasiliki Simaki
  • Sofia Stamou
  • Nikos Kirtsis

Abstract

In this paper, we report on a preliminary study that we carried out to identify patterns characterizing the genre of Greek texts. We address four distinct genre types, record their observable stylistic elements, and indicate how these elements can be exploited for automatic genre-based document classification. Our findings demonstrate that texts contain lexical features with discriminative power as far as genre is concerned; however, modeling those features so that they can be exploited by computer-based applications is still at an early stage.

Similar resources

Genre-Based Stages Classification for Polarity Analysis

Polarity detection of Online Reviews is one of the most popular tasks related to Opinion Mining. Given that most state-of-the-art solutions ignore the structural aspects of a review, we present an approach to polarity detection that, first, distinguishes stages in the genre of hotel reviews and, subsequently, evaluates the usefulness of each type of stage in the determination of the polarity of...

Overview of the PAN/CLEF 2015 Evaluation Lab

This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of text mining research focusing on the identification of personal traits of authors left behind in texts unintentionally. PAN 2015 comprises three tasks: plagiarism detection, author identification and author profiling studying important variations of these problem...

Text genres in information organization

Introduction. Text genres used by so-called information organizers in the processes of information organization in information systems were explored in this research. Method. The research employed text genre socio-functional analysis. Five genre groups in information organization were distinguished. Every genre group used in information organization is described. Empirical evidence for genre gr...

Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting

With due respect to the authors' rights, plagiarism detection is one of the critical problems in the field of text mining that many researchers are interested in. This issue is considered a serious one in high academic institutions. There exist language-free tools which do not yield reliable results, since the special features of every language are ignored in them. Considering the paucit...

Automatic Detection of Text Genre

As the text databases available to users become larger and more heterogeneous, genre becomes increasingly important for computational linguistics as a complement to topical and structural principles of classification. We propose a theory of genres as bundles of facets, which correlate with various surface cues, and argue that genre detection based on surface cues is as successful as detection b...

Publication date: 2012